Forethought and Hindsight in Credit Assignment
We address the problem of credit assignment in reinforcement learning and explore fundamental questions about how an agent can best use additional computation to propagate new information by planning with internal models of the world to improve its predictions. In particular, we aim to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We establish the relative merits, limitations, and complementary properties of both planning mechanisms in carefully constructed scenarios. Further, we investigate the best use of models in planning, focusing primarily on the selection of states in which predictions should be (re-)evaluated. Lastly, we discuss the issue of model estimation and highlight a spectrum of methods that stretch from environment dynamics predictors to planner-aware models.
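To make the forward/backward distinction concrete, here is a minimal tabular sketch. It is not the paper's method or code. The toy world, the deterministic models, the step size, and the function names `plan_forward` and `plan_backward` are all assumptions made for illustration. Forethought re-evaluates a chosen state by looking ahead to its predicted successor. Hindsight starts from a state whose value has just changed and pushes that change back to its predicted predecessors.

```python
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 1.0   # full-step updates keep the arithmetic easy to follow (assumed)

# Hypothetical deterministic toy world: states 0, 1, 2 all lead to state 3
# with reward 0. The forward model maps s -> (s_next, reward).
forward_model = {0: (3, 0.0), 1: (3, 0.0), 2: (3, 0.0)}

# The backward model maps s_next -> list of (s, reward) predecessors.
backward_model = {}
for s, (s_next, r) in forward_model.items():
    backward_model.setdefault(s_next, []).append((s, r))


def plan_forward(v, s):
    """Forethought: re-evaluate state s using its predicted successor."""
    v = v.copy()
    s_next, r = forward_model[s]
    v[s] += ALPHA * (r + GAMMA * v[s_next] - v[s])
    return v


def plan_backward(v, s_next):
    """Hindsight: re-evaluate every predicted predecessor of s_next."""
    v = v.copy()
    for s, r in backward_model.get(s_next, []):
        v[s] += ALPHA * (r + GAMMA * v[s_next] - v[s])
    return v


# New information arrives: the value estimate of state 3 has just changed.
v = [0.0, 0.0, 0.0, 1.0]

v_fore = plan_forward(v, 0)   # only state 0 is re-evaluated
v_hind = plan_backward(v, 3)  # states 0, 1 and 2 are all re-evaluated
```

In this toy case, one backward planning step from state 3 updates all three predecessors, while one forward planning step updates only the state it was started from. Which state to plan from is exactly the selection question the abstract raises.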
Review for NeurIPS paper: Forethought and Hindsight in Credit Assignment
Additional Feedback: Major points: Lines 234-246: what are "fan-in" and "fan-out"? This paragraph doesn't explain them. Do they relate to the neural network architecture (shown in Figure 1)? I have no idea what "large fan-in" and "small fan-out" mean. Is channeling a bottleneck in the state space? If so, where is it?